TensorFlow Project Exercise

Let's wrap up this Deep Learning section by taking a quick look at the effectiveness of Neural Nets!

We'll use the Banknote Authentication Data Set from the UCI Machine Learning Repository.

The data consists of 5 columns:

  • variance of Wavelet Transformed image (continuous)
  • skewness of Wavelet Transformed image (continuous)
  • kurtosis of Wavelet Transformed image (continuous)
  • entropy of image (continuous)
  • class (integer)

Here class indicates whether or not a bank note was authentic.

This sort of task is perfectly suited for Neural Networks and Deep Learning! Just follow the instructions below to get started!

Get the Data

Use pandas to read in the bank_note_data.csv file


In [1]:
import pandas as pd

In [3]:
data = pd.read_csv('bank_note_data.csv')

Check the head of the Data


In [61]:
data.head()


Out[61]:
Image.Var Image.Skew Image.Curt Entropy Class
0 3.62160 8.6661 -2.8073 -0.44699 0
1 4.54590 8.1674 -2.4586 -1.46210 0
2 3.86600 -2.6383 1.9242 0.10645 0
3 3.45660 9.5228 -4.0112 -3.59440 0
4 0.32924 -4.4552 4.5718 -0.98880 0
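
(Optional) Before plotting, it helps to confirm the file loaded cleanly. A minimal sanity check, assuming the standard UCI banknote file (1,372 rows, no missing values):

data.info()                    # should report 1372 non-null entries per column
data['Class'].value_counts()   # counts of each class label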

EDA

We'll just do a few quick plots of the data.

Import seaborn and set matplotlib inline for viewing


In [67]:
import seaborn as sns
%matplotlib inline

Create a Countplot of the Classes (Authentic 1 vs Fake 0)


In [68]:
sns.countplot(x='Class',data=data)


Out[68]:
<matplotlib.axes._subplots.AxesSubplot at 0x130bde4a8>

Create a PairPlot of the Data with Seaborn, set Hue to Class


In [69]:
sns.pairplot(data,hue='Class')


Out[69]:
<seaborn.axisgrid.PairGrid at 0x1313429e8>

Data Preparation

When using Neural Network and Deep Learning based systems, it is usually a good idea to standardize your data. Standardization rescales each feature to z = (x − μ) / σ, so every column ends up with zero mean and unit variance. This step isn't actually necessary for our particular data set, but let's run through it for practice!

Standard Scaling

Import StandardScaler() from SciKit Learn.


In [71]:
from sklearn.preprocessing import StandardScaler

Create a StandardScaler() object called scaler.


In [72]:
scaler = StandardScaler()

Fit scaler to the features.


In [73]:
scaler.fit(data.drop('Class',axis=1))


Out[73]:
StandardScaler(copy=True, with_mean=True, with_std=True)
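
Fitting stores the per-column statistics the scaler will apply. If you're curious, the standard sklearn attributes hold them:

print(scaler.mean_)    # per-column means learned from the features
print(scaler.scale_)   # per-column standard deviations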

Use the .transform() method to transform the features to a scaled version.


In [74]:
scaled_features = scaler.transform(data.drop('Class',axis=1))

Convert the scaled features to a dataframe and check the head of this dataframe to make sure the scaling worked.


In [77]:
df_feat = pd.DataFrame(scaled_features,columns=data.columns[:-1])
df_feat.head()


Out[77]:
Image.Var Image.Skew Image.Curt Entropy
0 1.121806 1.149455 -0.975970 0.354561
1 1.447066 1.064453 -0.895036 -0.128767
2 1.207810 -0.777352 0.122218 0.618073
3 1.063742 1.295478 -1.255397 -1.144029
4 -0.036772 -1.087038 0.736730 0.096587
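
To verify numerically, every column should now have mean ≈ 0 and standard deviation ≈ 1 (pandas computes std with ddof=1, so it won't be exactly 1). A quick check:

print(df_feat.mean())   # all approximately 0
print(df_feat.std())    # all approximately 1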

Train Test Split

Create two objects X and y which are the scaled feature values and labels respectively.


In [79]:
X = df_feat

In [80]:
y = data['Class']

Use the .values attribute on X and y and set them equal to the result. We need to do this in order for TensorFlow to accept the data in NumPy array form instead of pandas objects. (The older .as_matrix() method has been removed from pandas; .values does the same job.)


In [81]:
X = X.values
y = y.values

Use SciKit Learn to create training and testing sets of the data as we've done in previous lectures:


In [45]:
from sklearn.model_selection import train_test_split  # cross_validation was removed from newer sklearn

In [46]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
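
(Optional) A quick shape check confirms the 70/30 split; with 1,372 rows you should see roughly 960 training and 412 testing samples:

print(X_train.shape, X_test.shape)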

Contrib.learn

Import tensorflow.contrib.learn.python.learn as learn. Note that tf.contrib only exists in TensorFlow 1.x; it was removed in TensorFlow 2.


In [82]:
import tensorflow.contrib.learn.python.learn as learn

Create an object called classifier which is a DNNClassifier from learn. Set it to have 2 classes and a [10,20,10] hidden unit layer structure:


In [83]:
classifier = learn.DNNClassifier(hidden_units=[10, 20, 10], n_classes=2)
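
Note: later TensorFlow 1.x releases also require a feature_columns argument for DNNClassifier. If the call above raises an error, this variant (an assumption about your TF version, not part of the original run) usually works:

feature_columns = learn.infer_real_valued_columns_from_input(X_train)
classifier = learn.DNNClassifier(feature_columns=feature_columns,
                                 hidden_units=[10, 20, 10], n_classes=2)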

Now fit classifier to the training data. Use steps=200 with a batch_size of 20. You can play around with these values if you want!

Note: Ignore any warnings you get; they won't affect your output.
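
For a sense of scale: steps × batch_size = 200 × 20 = 4,000 training examples seen, i.e. roughly four passes over the ~960-row training set, which is plenty for a problem this small.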


In [94]:
classifier.fit(X_train, y_train, steps=200, batch_size=20)


/Users/marci/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py:1197: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  result_shape.insert(dim, 1)
Out[94]:
DNNClassifier()

Model Evaluation

Use the predict method from the classifier model to create predictions from X_test


In [95]:
note_predictions = classifier.predict(X_test)
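
Depending on your TensorFlow 1.x version, predict may return a generator rather than an array, which will break the sklearn metric calls below. If that happens, materializing the predictions is a common fix (version-dependent, not part of the original run):

note_predictions = list(classifier.predict(X_test))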

Now create a classification report and a Confusion Matrix. Does anything stand out to you?


In [96]:
from sklearn.metrics import classification_report,confusion_matrix

In [97]:
print(confusion_matrix(y_test,note_predictions))


[[237   0]
 [  1 174]]
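
Reading the matrix: 237 true negatives and 174 true positives, with only 1 false negative and 0 false positives, giving an accuracy of (237 + 174) / 412 ≈ 99.8%.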

In [98]:
print(classification_report(y_test,note_predictions))


             precision    recall  f1-score   support

          0       1.00      1.00      1.00       237
          1       1.00      0.99      1.00       175

avg / total       1.00      1.00      1.00       412

Optional Comparison

You should have noticed extremely accurate results from the DNN model. Let's compare this to a Random Forest Classifier for a reality check!

Use SciKit Learn to Create a Random Forest Classifier and compare the confusion matrix and classification report to the DNN model


In [99]:
from sklearn.ensemble import RandomForestClassifier

In [100]:
rfc = RandomForestClassifier(n_estimators=200)

In [101]:
rfc.fit(X_train,y_train)


Out[101]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=200, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

In [102]:
rfc_preds = rfc.predict(X_test)

In [103]:
print(classification_report(y_test,rfc_preds))


             precision    recall  f1-score   support

          0       1.00      0.98      0.99       237
          1       0.98      0.99      0.99       175

avg / total       0.99      0.99      0.99       412


In [104]:
print(confusion_matrix(y_test,rfc_preds))


[[233   4]
 [  1 174]]

It should also have done very well, but not quite as well as the DNN model (5 misclassified notes here versus 1 for the DNN). Hopefully you have seen the power of DNNs!

Great Job!